Watermark embedding and detection of a motion image signal

ABSTRACT

Methods and arrangements are disclosed for embedding and detecting a watermark in a cinema movie, such that the watermark can be detected in a copy made by a handheld video camera. The watermark embedder divides each image frame into two areas. A watermark bit ‘+1’ is embedded in a frame by increasing the luminance of the first part and decreasing the luminance of the second part. A watermark bit ‘−1’ is embedded by decreasing the luminance of the first part and increasing the luminance of the second part. It is achieved with the invention that the embedded watermark survives ‘de-flicker’ operations that are often used to remove flicker caused by the different frame rates of cinema projection equipment and consumer camcorders.

FIELD OF THE INVENTION

The invention relates to a method and arrangement for embedding a watermark in motion image signals such as movies projected in cinemas. The invention also relates to a method and arrangement for detecting a watermark embedded in such motion image signals.

BACKGROUND OF THE INVENTION

Watermark embedding is an important aspect of copy protection strategies. Although most copy protection schemes deal with protection of electronically distributed contents (broadcasts, storage media), copy protection is also desired for movies shown in theaters. Nowadays, illegal copying of cinema material by means of a handheld video camera is already common practice. Although the quality is usually low, the economic impact of illegal VHS tapes, CD-Videos and DVDs can be enormous. For this reason, cinema owners are obliged to prevent the presence of video cameras on their premises. Not following this rule may be sanctioned with a ban on the future availability of content. In view hereof, it is envisioned that a watermark will be added during show time. The watermark is to identify the cinema, the presentation time, operator, etc.

Robustness to geometric distortions is a key requirement for such watermark embedding schemes. A handheld camera will not only seriously degrade the video by filtering (the optical path from the screen to the camera, transfer to tape, etc.) but also seriously geometrically distort the video (shifting, scaling, rotation, shearing, changes in perspective, etc.). In addition, these geometrical distortions can change from frame to frame.

A prior-art method of embedding a watermark in cinema movies, which meets the robustness requirements, is disclosed in Jaap Haitsma and Ton Kalker: A Watermarking Scheme for Digital Cinema; Proceedings ICIP, Vol. 2, 2001, pp. 487-489. The robustness to geometric distortions is achieved by exploiting only the temporal axis to embed the watermark. The watermark is a periodic pseudo-random sequence of watermark samples having two distinct values, e.g. ‘1’ and ‘−1’. One watermark sample is embedded in each image. The value ‘1’ is embedded in an image by increasing a global property (e.g. the mean luminance) of the image, the value ‘−1’ is embedded by decreasing said global property.

The prior-art watermark embedding method actually embeds flicker. By embedding the same watermark sample in a number of consecutive images, the flicker is made imperceptible (the human eye is less sensitive to low-frequency flicker).

Flicker of the recorded movie is also caused by a) the typical mismatch between the cinema projector's frame rate (24 frames per second) and the camcorder's frame rate (25 fps for PAL, 29.97 fps for NTSC), and b) the difference between the two display scan formats (progressive vs. interlace). This kind of flicker is so annoying that de-flickering tools have been made widely available to the public. For example, a de-flicker plug-in for the video capturing and processing application “Virtualdub” has been found on the Internet.

A problem of the prior-art watermark embedding scheme is that de-flicker tools also remove the embedded watermark.

OBJECT AND SUMMARY OF THE INVENTION

It is an object of the invention to further improve the prior-art watermark embedding and detection method. It is a particular object of the invention to provide a watermark embedding and detection scheme, which is robust to de-flickering operations.

To this end, the method of embedding a watermark in a motion image signal according to the invention includes dividing each image into at least a first and a second image area. One value of a watermark sample is embedded in an image by increasing the global property (e.g. the mean luminance) of its first area and decreasing the global property of its second area. The other value of the watermark sample is embedded in the opposite way, i.e. by decreasing the global property of the first image area and increasing the global property of the second image area.

The invention exploits the insight that de-flickering tools remove flicker by adjusting the mean luminance of successive images to exhibit a low-pass character. The mean luminance is adjusted in practice by multiplying all pixels of an image by the same factor. Because this operation does not affect the (sign of the) modifications applied to the distinct image areas, the watermark information will be retained. In view hereof, the global property of an image area being modified to embed the watermark is the mean luminance of said image area.

In a preferred embodiment of the method, the first and the second image area are the upper and lower half of an image. In general, there are more horizontal than vertical movements in a movie. The horizontal movements influence the mean luminance values to a smaller extent.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a watermark embedder in accordance with the invention.

FIG. 2 is a schematic diagram of a watermark detector in accordance with the invention.

FIG. 3 is a schematic diagram of a correlation stage, which is an element of the watermark detector shown in FIG. 2.

FIG. 4 shows graphs of the mean luminance values of an original image sequence and a watermarked image sequence.

DESCRIPTION OF EMBODIMENTS

FIG. 1 is a schematic diagram of a watermark embedder in accordance with the invention. The embedder receives a sequence of images or frames having a luminance F(n,k) at spatial position n of frame k. The embedder further receives a watermark in the form of a pseudo-random sequence w(n) of length N, where w(n)∈{−1, 1}. An appropriate value of N for this application is N=1024. The arrangement comprises a dividing stage 10, which divides each image into a first (e.g. upper half) area and a second (e.g. lower half) area. The luminance of said image areas is denoted F₁(n,k) and F₂(n,k), respectively.

In the simplest embodiment of the watermark embedder, the sequence w(n) is directly applied to embedding stages 11 and 12. In such an embodiment, embedding stage 11 adds one applied watermark sample w(n) to every pixel of the first image area, whereas embedding stage 12 subtracts the same watermark sample from every pixel of the second image area. Clipping is performed where necessary. The mean luminances of the first and second image areas are thus oppositely modulated by the watermark.
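Purely by way of illustration, the simplest embodiment may be sketched in Python as follows. The function names, the representation of a frame as a list of pixel rows, and the `depth` parameter are assumptions made for this sketch and do not appear in the disclosure; the 8-bit luminance range 0 to 255 is assumed for clipping.

```python
def embed_sample(frame, w, depth=1):
    """Embed one watermark sample w (+1 or -1) into a frame given as a list of pixel rows.

    depth is a hypothetical embedding strength; depth=1 adds w itself.
    """
    half = len(frame) // 2  # rows before this index form the first (upper) area
    marked = []
    for r, row in enumerate(frame):
        # Add w in the first area, subtract it in the second area.
        delta = depth * w if r < half else -depth * w
        # Clip to the assumed 8-bit luminance range where necessary.
        marked.append([min(255, max(0, p + delta)) for p in row])
    return marked


def mean_luminance(rows):
    """Mean luminance of an image area given as a list of pixel rows."""
    pixels = [p for row in rows for p in row]
    return sum(pixels) / len(pixels)


frame = [[100] * 4 for _ in range(4)]
marked = embed_sample(frame, +1)
print(mean_luminance(marked[:2]), mean_luminance(marked[2:]))
```

In this sketch the mean luminance of the upper half rises and that of the lower half falls by the same amount, so the mean of the whole frame is unchanged as long as no clipping occurs.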

Other examples of global image properties that can be modulated by the watermark are picture histograms (a list of relative frequencies of luminance values in the picture), or features derived therefrom such as high order moments (average of luminance values to a power k). The mean luminance is a specific example of the latter (k=1).

Since the Human Visual System (HVS) is sensitive to flicker in low spatial frequencies, this simple embodiment may suffer from artifacts, especially in non-moving flat areas. These artifacts are significantly reduced by lowering the flicker frequency of the watermark. This is performed by a repetition stage 13, which repeats each watermark sample during a predetermined number K of consecutive images. The same watermark sample is thus embedded in K consecutive frames. The watermark repeats itself every N=1024 samples, i.e. every N·K frames. The watermark sample w(n) being embedded in frame k can be mathematically denoted by w(⌊k/K⌋ mod N). For simplicity, this expression will hereinafter be abbreviated to w(k).
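The indexing performed by the repetition stage can be illustrated with a short Python sketch. The function name `repeated_sample` and the toy watermark of length 4 are assumptions for illustration only; K=5 is used as the default merely because it is mentioned further on as a typical value.

```python
def repeated_sample(w, k, K=5):
    """Return the watermark sample embedded in frame k, i.e. w(floor(k/K) mod N)."""
    return w[(k // K) % len(w)]


w = [1, -1, -1, 1]  # toy watermark with N = 4 (the text suggests N = 1024)
schedule = [repeated_sample(w, k) for k in range(12)]
print(schedule)
```

Each sample is held for K consecutive frames, and the index wraps around after N·K frames.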

The preferred embodiment of the embedder which is shown in FIG. 1 further adapts the embedding depth in dependence upon the image contents. To this end, the embedder comprises multipliers 14 and 15, which multiply the watermark sample w(k) by a local scaling factor C_(F,1)(n,k) and C_(F,2)(n,k), respectively. The local scaling factors are derived from the image contents by image analyzers 16 and 17, respectively. For example, they are large in moving textured parts of an area and small in non-moving flat parts. The outputs of the embedding stages 11 and 12 can be formulated as:

$F_{w,1}(\underline{n},k) = F_{1}(\underline{n},k) + C_{F,1}(\underline{n},k)\,w(k)$

$F_{w,2}(\underline{n},k) = F_{2}(\underline{n},k) - C_{F,2}(\underline{n},k)\,w(k)$

It will be appreciated that both embedding operations can be carried out in a time-sequential manner by a single processing circuit under appropriate software control.

The two image areas are subsequently combined by a combining stage 18 into a single watermarked image F_(w)(n,k).
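A minimal Python sketch of the embedding stages 11 and 12 and the combining stage 18 is given below under stated assumptions. The scaling maps `c1` and `c2` are simply passed in by the caller, since the operation of the image analyzers 16 and 17 is not specified here; the function name `embed_scaled` and the stacking of the two areas as upper and lower halves are likewise assumptions for this sketch.

```python
def embed_scaled(area1, area2, c1, c2, w):
    """Apply F_w1 = F_1 + C_F1 * w and F_w2 = F_2 - C_F2 * w, then recombine.

    area1, area2: pixel rows of the first and second image area.
    c1, c2: hypothetical per-pixel scaling maps with the same shapes.
    w: watermark sample for this frame, +1 or -1.
    """
    def clip(value):
        return min(255, max(0, value))  # assumed 8-bit luminance range

    out1 = [[clip(p + c * w) for p, c in zip(row, crow)]
            for row, crow in zip(area1, c1)]
    out2 = [[clip(p - c * w) for p, c in zip(row, crow)]
            for row, crow in zip(area2, c2)]
    # Combining stage: stack the upper area on top of the lower area.
    return out1 + out2


combined = embed_scaled([[100, 100]], [[50, 50]], [[2, 0]], [[1, 3]], -1)
print(combined)
```

A scaling factor of zero leaves the corresponding pixel untouched, which is how flat, non-moving regions could be spared in such a sketch.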

FIG. 2 is a schematic diagram of a watermark detector in accordance with the invention. Although the original signal is available during detection, the detector does not utilize any knowledge about the original. The arrangement receives a watermarked sequence of images or frames having a luminance F_(w)(n,k) at spatial position n of frame k. The detector comprises a dividing stage 20, which divides each image into a first (e.g. upper half) area and a second (e.g. lower half) area in a similar manner as dividing stage 10 (FIG. 1) of the embedder. The luminance of each image area is denoted F_(w,1)(n,k) and F_(w,2)(n,k), respectively. For each image area, the detector further includes a mean luminance computing circuit 21, 22, which computes the mean luminance values f_(w,1)(k) and f_(w,2)(k) (or other global property, if applicable) of the respective image areas in accordance with:

$f_{w,i}(k) = \frac{1}{N}\sum_{\underline{n}} F_{w,i}(\underline{n},k)$

where N in this expression denotes the number of pixels summed over, not the watermark length.

In practice, the mean luminance values of a movie exhibit a low-pass nature as a function of the frame number k (i.e. as a function of time). The detector estimates the mean luminance values of the original (unwatermarked) movie by low-pass filtering (25,27) the respective mean luminance values f_(w,1)(k) and f_(w,2)(k). Estimations of the mean luminance modifications as introduced by the embedder are subsequently obtained by subtracting (26,28) the low-pass filtered mean values from the unfiltered mean luminance values. The detector estimates the embedded watermark sample by subtracting (23) both estimations, followed by a sign operation (29). The estimated watermark sample being embedded in frame k is denoted v(k).
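The estimation chain described above (low-pass filter, subtraction per area, subtraction between areas, sign) may be sketched in Python as follows. The moving-average filter and its length of 9 are assumptions, since no particular low-pass filter is prescribed; the function names are hypothetical, and the toy input uses K=1 for brevity.

```python
def moving_average(x, length=9):
    """Centered moving average; near the ends only the available samples are averaged."""
    half = length // 2
    out = []
    for k in range(len(x)):
        window = x[max(0, k - half): k + half + 1]
        out.append(sum(window) / len(window))
    return out


def estimate_samples(f1, f2, length=9):
    """Estimate v(k) from the per-frame mean luminances f1, f2 of the two areas."""
    lp1 = moving_average(f1, length)  # estimate of the original means, area 1
    lp2 = moving_average(f2, length)  # estimate of the original means, area 2
    v = []
    for k in range(len(f1)):
        d = (f1[k] - lp1[k]) - (f2[k] - lp2[k])
        v.append(1 if d > 0 else -1)
    return v


w = [1, -1] * 10                 # toy watermark, one sample per frame
f1 = [120 + s for s in w]        # area 1 means: raised for +1, lowered for -1
f2 = [90 - s for s in w]         # area 2 means: modified in the opposite way
v = estimate_samples(f1, f2)
print(v == w)
```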

The arrangement thus generates a sequence of estimated watermark samples. In a correlation stage 3, the estimated sequence of watermark samples is correlated with the watermark being looked for. The detector receives this watermark being looked for in the form of a pseudo-random sequence w(n) of length N, where w(n)∈{−1, 1}. The detector comprises a repetition stage 24, which is identical to the repetition stage 13 of the embedder. The repetition stage repeats each watermark sample during K consecutive images. The watermark repeats itself every N=1024 samples, i.e. every N·K frames. The watermark sample being applied to the correlation stage 3 for frame k is denoted w(k). Again, w(k) is an abbreviation for the mathematically more correct expression w(⌊k/K⌋ mod N).

It should be noted that the low-pass filter/subtracter combinations 25,26 and 27,28, as well as the sign operation 29 are optional.

FIG. 3 is a schematic diagram of the correlation stage 3. The estimated watermark samples of successive images are distributed to K buffers 31, 32, . . . , where, as described above, K is the number of consecutive images in which the same watermark sample is embedded. Each buffer stores N estimated watermark samples (or N computed mean luminance values, or N estimated mean luminance modification values). Typical values of the watermark length N and the frames per watermark sample K are 1024 and 5, respectively. Accordingly, the first buffer 31 contains estimated watermark samples v(1), v(6), v(11), . . . , the second buffer 32 contains v(2), v(7), v(12), . . . , etc. This implies that the granularity of watermark detection is approximately 3 minutes and 25 seconds for PAL video.
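The distribution of the estimated samples over the K buffers, and the arithmetic behind the stated detection granularity, can be illustrated by the following Python sketch. The function name `distribute` is an assumption; the list `v` merely holds the indices 1 to 15 as stand-ins for v(1) to v(15).

```python
def distribute(v, K=5):
    """Send every K-th estimated sample to the same buffer."""
    return [v[j::K] for j in range(K)]


v = list(range(1, 16))      # stand-ins for v(1) ... v(15)
buffers = distribute(v)
print(buffers[0], buffers[1])

# Frames needed to fill every buffer with N samples, and the duration at PAL rate.
N, K, fps = 1024, 5, 25
seconds = N * K / fps
print(int(seconds // 60), "min", round(seconds % 60, 1), "s")
```

With N=1024 and K=5, a full set of buffers spans 5120 frames, which at 25 frames per second is 204.8 seconds.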

The watermark is detected by determining the similarity of the contents of each buffer with the reference watermark w(n) being looked for. Each watermark can identify, for instance, one theatre. A well-known example of similarity is cross-correlation, but other measures are possible. The contents of each buffer are cross-correlated with the reference watermark in respective correlators 33,34, . . . The correlation is preferably performed using Symmetrical Phase Only Matched Filtering (SPOMF). For a description of SPOMF, reference is made to International Patent application WO 99/45706. In said document, the correlation is performed in the two-dimensional spatial domain. Blocks of N×N image pixels are correlated with an N×N reference watermark. The result of the SPOMF operation is an N×N pattern of correlation values exhibiting one or more peaks if a watermark has been embedded.

The K correlators 33,34, . . . operate in the one-dimensional time domain. The output of each correlator is a series of N correlation values which is stored in a corresponding one of K buffers 35,36, . . . A peak detector 37 searches the highest correlation value in the K buffers, and applies said peak value to a threshold circuit 38. If the peak value of at least one of the buffers is larger than a given threshold value, it is decided that the watermark is present. Otherwise, the content will be classified as not watermarked.
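For illustration, the correlators, peak detector and threshold circuit may be sketched in Python as follows. Note that this sketch substitutes plain normalized circular cross-correlation for SPOMF; the function names, the threshold value of 0.5 and the toy watermark length of 64 are assumptions made for this example only.

```python
import random


def circular_correlation(x, w):
    """Normalized circular cross-correlation of buffer x with watermark w, for all N shifts."""
    n = len(w)
    return [sum(x[(i + s) % n] * w[i] for i in range(n)) / n for s in range(n)]


def detect(buffers, w, threshold=0.5):
    """Search the highest correlation value over all buffers and compare it with a threshold."""
    peak = max(max(circular_correlation(b, w)) for b in buffers)
    return peak > threshold, peak


rng = random.Random(0)
w = [rng.choice((-1, 1)) for _ in range(64)]   # toy reference watermark
shifted = w[-3:] + w[:-3]                      # buffer holding w at a cyclic shift of 3
present, peak = detect([shifted, [0] * 64], w)
print(present, peak)
```

Because all shifts are evaluated, the watermark is found irrespective of where in its period the recording starts; the position of the peak is what a payload encoded by shifting could exploit.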

A payload can be encoded in the signal by embedding shifted versions of the watermark w(n) in a manner similar to the one disclosed in International Patent Application WO-A-99/45705. It should further be noted that, although parallel correlators are shown in FIG. 3, it may be advantageous to carry out the respective operations in a time-sequential manner.

In order to describe the invention in even more detail, a mathematical analysis of the prior-art watermarking scheme, an analysis of the publicly available de-flickering tool, and the operation of the watermarking scheme in accordance with the invention will now be given.

The watermark w is a periodic pseudo-random sequence containing only ‘1’ and ‘−1’ sample values with period M. A watermark sample w(n) is embedded in K consecutive frames k, k+1, . . . , k+K−1. By embedding one watermark sample in K consecutive frames, the frequency of the flickering due to the embedding is decreased. A ‘1’ is embedded in an image by increasing the luminance value of each pixel with a value C_(F)(n,k). A ‘−1’ is embedded by decreasing the luminance value of each pixel with C_(F)(n,k). Herein, n is the spatial coordinate of a pixel within frame k. More mathematically, we have:

$F_{w}(\underline{n},k) = \begin{cases} F(\underline{n},k) + C_{F}(\underline{n},k) & \text{if } w(\lfloor k/T \rfloor) = 1 \\ F(\underline{n},k) - C_{F}(\underline{n},k) & \text{if } w(\lfloor k/T \rfloor) = -1 \end{cases}$

where F is the frame to be embedded, F_(w) is the embedded frame, and T is the number of consecutive frames carrying the same watermark sample (denoted K in the text above). The change C_(F) is chosen to be such that the watermark is not visible, and therefore depends on F. In [2], a texture detector and a motion detector are used to determine C_(F). As a result, the mean luminance values of the watermarked video f_(w)

$f_{w}(k) = \frac{1}{N}\sum_{\underline{n}} F_{w}(\underline{n},k)$

with N being the number of pixels per frame, will exhibit a change with respect to the original mean luminance values

$f_{w}(k) = f(k) + c_{F}(k)\,w(\lfloor k/T \rfloor) \qquad (1)$

Herein, c_(F) is the local depth of the watermark w, which is directly related to C_(F):

$c_{F}(k) = \frac{1}{N}\sum_{\underline{n}} C_{F}(\underline{n},k)$

FIG. 4 shows, at (a), a graph of the mean luminance values of an original sequence and, at (b), a graph of the mean luminance values of an embedded sequence to visualize the watermark embedding concept.

Due to the watermark embedding, the mean luminance values will decrease or increase with respect to the original mean luminance values in time. See Eq. (1). In practice, the mean luminance values of a movie exhibit a low-pass nature. Therefore, the detector estimates these luminance values of the original unwatermarked movie by low-pass filtering the mean luminance values of the watermarked movie f_(w). The detector estimates the watermark v by subtracting these low-pass filtered means from the mean luminance values of the watermarked movie f_(w), followed by a sign operation. More mathematically,

$v(k) = \operatorname{sign}\{ f_{w}(k) - (f_{w} \otimes g)(k) \} \qquad (2)$

where ⊗ denotes a (one-dimensional) convolution, and g is a low-pass filter. Since a watermark sample is embedded in K consecutive frames, this operation yields K estimates $\tilde{w}_{l}$ of the watermark w:

$\tilde{w}_{l}(k) = v(l + kK), \quad 0 \le l < K \qquad (3)$

Each of these K estimated watermarks $\tilde{w}_{l}$ is correlated with the original watermark w. If the absolute correlation value $d_{l} = |\langle w, \tilde{w}_{l} \rangle|$ is larger than a threshold value, it is decided that the video sequence is watermarked.

A movie is projected progressively at a frame rate of 24 frames per second (fps), whereas a standard video camera records at 25 fps (PAL) or 29.97 fps (NTSC) interlaced. Due to this interlacing, the luminance will not be the same throughout the recording of one frame, as the shutter may just be opening or just be closing. Since the video camera and the projector are not synchronized, this problem is difficult for a cameraman to solve. In addition, since the frame rates of the projector and the video camera are not matched, a similar problem arises: at some points in time, a frame is recorded when the shutter is half-open or even fully shut. The result of these mismatches is flickering in the recorded movie.

A ‘de-flicker’ plug-in for Virtualdub can be found on the Internet. This plug-in removes the flickering in four steps:

-   In a first pass, it calculates the mean luminance values $\hat{f}_{w}$ of the movie;
-   Subsequently, it filters these means $\hat{f}_{w}$ with a low-pass filter h (default is a simple averaging filter of length 12);
-   Then it calculates, for each frame, a factor β(k) as the ratio of the filtered mean to the original mean $\hat{f}_{w}(k)$:

$\beta(k) = \frac{(\hat{f}_{w} \otimes h)(k)}{\hat{f}_{w}(k)}$

-   In a second pass, the luminance value of each pixel in frame k is multiplied by the corresponding factor β(k); the result is rounded to the nearest integer and clipped if it exceeds the maximum luminance value 255.

Note that β(k) is non-negative for all k, since $\hat{f}_{w}(k) \ge 0$ and h is a low-pass filter. If we ignore the rounding and the clipping for the moment, the result of these multiplications in the last step is that the means of the newly constructed video $\hat{f}_{w,deflic}$ resemble the low-pass filtered means of $\hat{f}_{w}$:

$\hat{f}_{w,deflic}(k) = \beta(k)\hat{f}_{w}(k) = (\hat{f}_{w} \otimes h)(k)$

Perceptually, this means that the newly constructed video exhibits less flickering, because flickering can be seen as a high-frequency component in the mean luminance values of the frames, which is now filtered out.
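The four steps of the de-flicker operation can be sketched in Python as follows. This is not the plug-in's actual code: the function names, the alignment of the length-12 averaging window, its shortening at the end of the sequence, and the round-half-up rule are all assumptions made for this illustration. Frame means are assumed to be strictly positive so that the division is defined.

```python
def averaging_filter(x, length=12):
    """Simple averaging filter h; the window is shortened at the end of the sequence."""
    half = length // 2
    out = []
    for k in range(len(x)):
        lo = max(0, k - half)
        window = x[lo: lo + length]
        out.append(sum(window) / len(window))
    return out


def deflicker_factors(means, length=12):
    """beta(k): ratio of the filtered mean to the original mean of frame k."""
    filtered = averaging_filter(means, length)
    return [f / m for f, m in zip(filtered, means)]


def deflicker_frame(frame, beta):
    """Second pass: multiply every pixel by beta, round to nearest, clip at 255."""
    return [[min(255, int(p * beta + 0.5)) for p in row] for row in frame]


means = [110.0 if k % 2 == 0 else 90.0 for k in range(24)]  # flickering frame means
beta = deflicker_factors(means)
new_means = [b * m for b, m in zip(beta, means)]
print(new_means[12], deflicker_frame([[200, 250]], 1.1))
```

Ignoring rounding and clipping, the product β(k) times the original mean reproduces the filtered mean, which is the relation stated in the equation above.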

Unfortunately, the prior-art watermarking scheme itself amounts to a flickering, albeit an imperceptible one. As a direct consequence, this ‘de-flicker’ plug-in removes the watermark. The watermark embedding scheme must thus be modified in such a way that it is robust to de-flickering. This is all the more true as this ‘de-flicker’ tool is widely used to remove the flickering from pirate copies.

To this end, each frame is divided into two parts (e.g. left/right or top/bottom), and the watermark sample is embedded in these parts in an opposite way. To embed a watermark sample, the mean luminance of one part is increased and the mean luminance of the other part is decreased. Consequently, the mean luminance values f_(w) of the watermarked movie now consist of two parts

$f_{w}(k) = f_{w,1}(k) + f_{w,2}(k)$

with (cf. Eq. (1))

$f_{w,1}(k) = f_{1}(k) + c_{F,1}(k)\,w(\lfloor k/T \rfloor)$ and $f_{w,2}(k) = f_{2}(k) - c_{F,2}(k)\,w(\lfloor k/T \rfloor) \qquad (4)$

After capturing with a camera, the ‘de-flicker’ tool removes the flickering by low-pass filtering the mean luminance values $\hat{f}_{w}$:

$\hat{f}_{w,deflic}(k) = \beta(k)[\hat{f}_{w,1}(k) + \hat{f}_{w,2}(k)]$

The detection of the watermark for the modified watermarking scheme is similar to the detection method described above. First, the detector estimates the luminance values of the original unwatermarked movie for both parts by low-pass filtering the mean luminance values of both parts. Then it subtracts the result of both operations from the luminance values of the corresponding parts. Finally, it makes an estimate of the watermark $\tilde{v}$ by subtraction followed by a sign operation (cf. Eq. (2)):

$\tilde{v}(k) = \operatorname{sign}\{ [\hat{f}_{w,1,deflic}(k) - (\hat{f}_{w,1,deflic} \otimes g)(k)] - [\hat{f}_{w,2,deflic}(k) - (\hat{f}_{w,2,deflic} \otimes g)(k)] \} \qquad (5)$

The K estimations $\tilde{w}_{l}$ (see Eq. (3)) are obtained in a similar way and correlated with the watermark w.

The embedded watermark survives the de-flicker operation because the de-flicker tool multiplies all the pixels of an image by the same factor β(k), thereby leaving the luminance differences between the two image areas substantially intact.

The effect of the invention can also be explained more mathematically. It is assumed that the original unwatermarked movie exhibits a low-pass nature. After capturing of the movie with a camera, the mean luminance values of the watermarked parts $\hat{f}_{w,1}$ and $\hat{f}_{w,2}$ exhibit a flickering

$\hat{f}_{w,1}(k) = \gamma(k) f_{w,1}(k)$ and $\hat{f}_{w,2}(k) = \gamma(k) f_{w,2}(k)$

Herein, γ(k)>0 corresponds to the change in the mean luminance value (the flickering) of frame k. The ‘de-flicker’ plug-in removes this flickering by low-pass filtering the mean luminance values

$\hat{f}_{w,deflic}(k) = \beta(k)\hat{f}_{w}(k) = \beta(k)\gamma(k)[f(k) + \{c_{F,1}(k) - c_{F,2}(k)\}\,w(\lfloor k/T \rfloor)] \approx f(k).$

From this expression it follows that

$\beta(k)\gamma(k) \approx \frac{f(k)}{f(k) + \{c_{F,1}(k) - c_{F,2}(k)\}\,w(\lfloor k/T \rfloor)}$

Since in practice {c_(F,1)(k)−c_(F,2)(k)}w(⌊k/T⌋) is relatively small compared to f(k), we can approximate β(k)γ(k) by 1. By using this approximation, we see that

$\hat{f}_{w,1,deflic}(k) = \beta(k)\gamma(k) f_{w,1}(k) \approx f_{w,1}(k) = f_{1}(k) + c_{F,1}(k)\,w(\lfloor k/T \rfloor)$

For the other part, we obtain a similar result

$\hat{f}_{w,2,deflic}(k) \approx f_{2}(k) - c_{F,2}(k)\,w(\lfloor k/T \rfloor)$

Using these results, we finally obtain the following expression for $\tilde{v}$ (see Eq. (5))

$\tilde{v}(k) \approx \operatorname{sign}\{ f_{1}(k) - f_{2}(k) + [c_{F,1}(k) + c_{F,2}(k)]\,w(\lfloor k/T \rfloor) - [f_{1}(k) - f_{2}(k)] \} = \operatorname{sign}\{ [c_{F,1}(k) + c_{F,2}(k)]\,w(\lfloor k/T \rfloor) \} = \operatorname{sign}\{ w(\lfloor k/T \rfloor) \}$

where it is assumed that the low-pass filter g completely filters out the watermark. Note that c_(F,1)(k)+c_(F,2)(k) does not influence the sign of the expression, because it is non-negative for all k. It can be seen from this expression that the watermark indeed survives the ‘de-flicker’ operation after the modification.
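The analysis above can be checked numerically on simulated mean luminance values. The following Python sketch is a toy model under stated assumptions, not a reproduction of any real capture: constant area means of 100, embedding depths of 1, a square-wave watermark, a camera flicker γ(k) alternating between 1.1 and 0.9, an averaging filter of length 12 for h and of length 21 for g are all hypothetical choices made for this illustration.

```python
def smooth(x, length):
    """Moving average over `length` samples; the window is shortened at the sequence end."""
    half = length // 2
    out = []
    for k in range(len(x)):
        lo = max(0, k - half)
        window = x[lo: lo + length]
        out.append(sum(window) / len(window))
    return out


K, n = 5, 100
w = [1 if (k // K) % 2 == 0 else -1 for k in range(n)]   # w(floor(k/K)), toy square wave
gamma = [1.1 if k % 2 == 0 else 0.9 for k in range(n)]   # simulated camera flicker
f1 = f2 = 100.0                                          # original area means
c1 = c2 = 1.0                                            # embedding depths

# Captured area means: gamma(k) * f_{w,i}(k), cf. Eq. (4).
cap1 = [gk * (f1 + c1 * s) for gk, s in zip(gamma, w)]
cap2 = [gk * (f2 - c2 * s) for gk, s in zip(gamma, w)]

# De-flicker: beta(k) is derived from the whole-frame means and applied to both areas alike.
whole = [a + b for a, b in zip(cap1, cap2)]
beta = [lp / m for lp, m in zip(smooth(whole, 12), whole)]
d1 = [b * a for b, a in zip(beta, cap1)]
d2 = [b * a for b, a in zip(beta, cap2)]

# Detector, Eq. (5): remove the low-pass part per area, subtract, take the sign.
lp1, lp2 = smooth(d1, 21), smooth(d2, 21)
v = [1 if (a - la) - (b - lb) > 0 else -1
     for a, la, b, lb in zip(d1, lp1, d2, lp2)]
matches = sum(1 for x, y in zip(v, w) if x == y)
print(matches, "of", n, "samples recovered")
```

In this toy model the common factor β(k)γ(k) scales both area means equally, so the sign of their difference, and hence the estimated sample, is unaffected by the de-flicker step.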


1. A method of embedding a watermark in a motion image signal, the motion image signal comprising a sequence of images, the method comprising: representing the watermark by a sequence of watermark samples each having a value; dividing an image of the sequence of images into at least a first and a second image area; modifying the image in accordance with a watermark sample from the sequence of watermark samples, the modification comprising adding the watermark sample value to the luminance value of pixels of the first image area and subtracting the watermark sample value from the luminance value of pixels of the second image area to oppositely modify the mean luminances of the first and second image areas in accordance with the watermark sample; and for each of the sequence of watermark samples, repeating the dividing an image of the sequence of images and modifying of the image in accordance with a watermark sample from the sequence of watermark samples such that each watermark sample oppositely modifies the mean luminances of the first and second image areas of a different image of the sequence of images until the entire watermark is embedded.
 2. The method of claim 1, wherein the image modifying comprises modifying a series of consecutive images in accordance with the same watermark sample.
 3. The method of claim 1, wherein the first and second image areas are the upper and lower halves of an image, respectively.
 4. The method of claim 1, wherein the first and second image areas are the left and right halves of an image, respectively.
 5. An arrangement for embedding a watermark in a motion image signal, the motion image signal comprising a sequence of images, the arrangement comprising: means for representing the watermark by a sequence of watermark samples each having a value; means for dividing an image of the sequence of images into at least a first and a second image area; and image modifying means being arranged for modifying the image in accordance with a watermark sample from the sequence of watermark samples, the modification comprising adding the watermark sample value to the luminance value of pixels of the first image area and subtracting the watermark sample value from the luminance value of pixels of the second image area to oppositely modify the mean luminances of the first and second image areas in accordance with the watermark sample, wherein, the modifying means is further arranged to repeat the dividing an image of the sequence of images and modifying of the image for each of the sequence of watermark samples in accordance with a watermark sample from the sequence of watermark samples such that each watermark sample oppositely modifies the mean luminances of the first and second image areas of a different image of the sequence of images until the entire watermark is embedded. 